Word Sense Disambiguation Corpora Acquisition via Confirmation Code
Abstract
Word Sense Disambiguation (WSD) is one of the fundamental tasks in natural language processing. However, the lack of training corpora is a bottleneck in constructing a highly accurate all-words WSD system, and having experts annotate a large-scale corpus costs enormous time and money. Human computation is a novel idea for harnessing the human effort available on the Web, which has so far gone unused, to solve practical problems that are difficult for computers. Based on human computation, we design a confirmation code system that not only distinguishes human beings from computers (the function of an ordinary confirmation code system) but also annotates WSD corpora. Preliminary experimental results show that the proposed method can annotate large-scale, high-quality WSD corpora within a short time. To the best of our knowledge, this is the first attempt to use confirmation codes for corpus acquisition in natural language processing.
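The abstract does not describe how the confirmation code is built, so the following is only a minimal sketch of one plausible design, assuming a two-question challenge: a control item whose sense is already known verifies that the user is human, and the answer to a second, unlabeled item is stored as an annotation vote. The class and field names (`Item`, `ConfirmationCodeService`, `min_votes`, `min_agreement`) and the thresholds are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class Item:
    sentence: str
    target: str                   # the ambiguous word in the sentence
    senses: Tuple[str, ...]       # candidate senses shown to the user
    gold: Optional[str] = None    # known sense (control items only)


@dataclass
class ConfirmationCodeService:
    # Hypothetical thresholds; the paper's actual criteria are not given.
    min_votes: int = 3
    min_agreement: float = 0.7
    votes: dict = field(default_factory=lambda: defaultdict(Counter))

    def submit(self, control: Item, control_answer: str,
               unknown: Item, unknown_answer: str) -> bool:
        """Return True if the user passes; record a vote only on a pass."""
        if control_answer != control.gold:
            return False          # failed the human check, discard the vote
        if unknown_answer in unknown.senses:
            key = (unknown.sentence, unknown.target)
            self.votes[key][unknown_answer] += 1
        return True

    def label(self, unknown: Item) -> Optional[str]:
        """Return the agreed sense, or None if there is no consensus yet."""
        counts = self.votes[(unknown.sentence, unknown.target)]
        total = sum(counts.values())
        if total < self.min_votes:
            return None
        sense, n = counts.most_common(1)[0]
        return sense if n / total >= self.min_agreement else None
```

In this sketch a sense is accepted only after several verified users agree on it, which is one simple way to trade annotation speed against label quality.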
Related resources
Towards Automatic Acquisition of a Fully Sense Tagged Corpus for Persian
Sense tagged corpora play a crucial role in Natural Language Processing, particularly in Word Sense Disambiguation and Natural Language Understanding. Since semantic annotations are usually performed by humans, such corpora are limited to a handful of tagged texts and are not available for many languages with scarce resources including Persian. The shortage of efficient, reliable linguistic res...
Translation Selection through Source Word Sense Disambiguation and Target Word Selection
A word has many senses, and each sense can be mapped into many target words. Therefore, to select the appropriate translation with a correct sense, the sense of a source word should be disambiguated before selecting a target word. Based on this observation, we propose a hybrid method for translation selection that combines disambiguation of a source word sense and selection of a target word. Kn...
Automatic Acquisition of Sense Tagged Corpora
An important problem in Natural Language Processing is identifying the correct sense of a word in a particular context. Thus far, statistical methods have been considered the best techniques in word sense disambiguation. Unfortunately, these methods produce high accuracy results only for a small number of preselected words. The reduced applicability of statistical methods is due basically to the...
Comparing methods for automatic acquisition of Topic Signatures
The main goal of this work is to compare two methods for building Topic Signatures, which are vectors of weighted words acquired from large corpora. We used two different software tools, ExRetriever and Infomap, for acquiring Topic Signatures from corpus. Using these tools, we retrieve sense examples from large text collections. Both systems construct a query for each word sense using WordNet. ...
Using WordNet Lexical Database and Internet to Disambiguate Word Senses
The term “knowledge acquisition bottleneck” has been used in Word Sense Disambiguation Tasks (WSDTs) to describe the lack of large tagged corpora. In this paper, an automated WSDT is based on text corpora collected from Internet web pages. First, the disambiguation for the sense of a word, in a context, is based on the use of its definition and the definitio...